
Thus Pr(I = ∅) ≥ 2/3, as required.

(d) This is clearly true if V = ∅. If V ≠ ∅ and v = max V ∈ I_0 then, by induction,

Pr(I = I_0) = (N_1/(N_1 + N_2)) · (φ(N_1 + N_2)/N_1) = φ,

and similarly Pr(I = I_0) = φ if v ∉ I_0.

(e) Let E denote the event that some output of approxcount is bad in the iteration that produces output. Then for A ⊆ Ω,

π̂(A) ≤ Pr(I ∈ A | Ē) + Pr(E) ≤ |A|/|Ω| + δ,

and similarly π̂(A) ≥ |A|/|Ω| − δ.

We have therefore shown that by running Ugen a constant expected number of times, we will with probability at least 1 − δ output a randomly chosen independent set. The expected running time of Ugen is clearly as given in (1.11), which is small enough to make it a good sampler.

Having dealt with a specific example, we now see how to put the above ideas into a formal framework. Before doing this we enumerate some basic facts about Markov chains.

1.3 Markov Chains

Throughout, N = {0, 1, 2, ...}, N_+ = N \ {0}, Q_+ = {q ∈ Q : q > 0}, and [n] = {1, 2, ..., n} for n ∈ N_+.

A Markov chain M on the finite state space Ω, with transition matrix P, is a sequence of random variables X_t, t = 0, 1, 2, ..., which satisfy

Pr(X_t = σ | X_{t−1} = ω, X_{t−2}, ..., X_0) = P(ω, σ)   (t = 1, 2, ...).

We sometimes write P(ω, σ) as P_{ωσ}. The value of X_t is referred to as the state of M at time t. Consider the digraph D_M = (Ω, A), where A = {(σ, ω) ∈ Ω × Ω : P(σ, ω) > 0}. We will by and large be concerned with chains that satisfy the following assumptions:

M1 The digraph D_M is strongly connected.

M2 gcd{|C| : C is a directed cycle of D_M} = 1.
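Together, M1 and M2 say that the chain is irreducible and aperiodic; for a finite chain this is equivalent to some power of P having all entries strictly positive. The following is a minimal numerical sketch of that check (illustrative, not from the notes; is_ergodic is an assumed name):

```python
import numpy as np

def is_ergodic(P, t_max=None):
    """Check M1/M2 numerically: an irreducible, aperiodic finite chain
    has some power P^t whose entries are all strictly positive."""
    n = P.shape[0]
    # Wielandt's bound: for a primitive n x n non-negative matrix,
    # t = (n - 1)^2 + 1 steps suffice, so we never need to look further.
    t_max = t_max or (n - 1) ** 2 + 1
    Q = np.eye(n)
    for _ in range(t_max):
        Q = Q @ P
        if (Q > 0).all():
            return True
    return False

# Two-state chain that always swaps: periodic, hence not ergodic.
P_periodic = np.array([[0.0, 1.0], [1.0, 0.0]])
# The same chain made lazy is ergodic.
P_lazy = 0.5 * np.eye(2) + 0.5 * P_periodic
print(is_ergodic(P_periodic), is_ergodic(P_lazy))  # False True
```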

Under these assumptions, M is ergodic and therefore has a unique stationary distribution π, i.e.

lim_{t→∞} Pr(X_t = ω | X_0 = σ) = π(ω),   (1.12)

i.e. the limit does not depend on the starting state X_0. Furthermore, π is the unique left eigenvector of P with eigenvalue 1, i.e. it satisfies

P^T π = π.   (1.13)

Another useful fact is that if τ_σ denotes the expected number of steps between successive visits to state σ, then

τ_σ = 1/π(σ).   (1.14)

In most cases of interest, M is reversible, i.e.

Q(ω, σ) = π(ω)P(ω, σ) = π(σ)P(σ, ω)   (∀ω, σ ∈ Ω).   (1.15)

The central role of reversible chains in applications rests on the fact that π can be deduced from (1.15). If µ : Ω → R satisfies (1.15), then it determines π up to normalization. Indeed, if (1.15) holds and Σ_{ω∈Ω} π(ω) = 1, then

Σ_{ω∈Ω} π(ω)P(ω, σ) = Σ_{ω∈Ω} π(σ)P(σ, ω) = π(σ),

which proves that π is a left eigenvector with eigenvalue 1. In fact, we often design the chain to satisfy (1.15). Without reversibility, there is no apparent method of determining π, other than to explicitly construct the transition matrix, an exponential time (and space) computation in our setting.

As a canonical example of a reversible chain we have a random walk on a graph. A random walk on the undirected graph G = (V, E) is a Markov chain with state space V associated with a particle that moves from vertex to vertex according to the following rule: the probability of a transition from vertex i, of degree d_i, to vertex j is 1/d_i if {i, j} ∈ E, and 0 otherwise. Its stationary distribution is given by

π(v) = d_v / (2|E|)   (v ∈ V).   (1.16)

To see this, note that Q(v, w) = Q(w, v) = 0 if v, w are not adjacent, and otherwise Q(v, w) = 1/(2|E|) = Q(w, v), verifying the detailed balance equations (1.15). Note that if G is a regular graph then the steady state is uniform over V.
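Both facts are easy to confirm numerically on a small graph. A minimal sketch (illustrative, assuming a small hand-picked graph):

```python
import numpy as np

# Random walk on a small non-regular graph: verify that pi(v) = d_v/(2|E|)
# from (1.16) is stationary and satisfies detailed balance (1.15).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # triangle with a pendant vertex
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)
P = A / deg[:, None]          # P(i, j) = 1/d_i for each neighbour j of i
pi = deg / deg.sum()          # = d_v / (2|E|), since deg.sum() = 2|E|
print(np.allclose(pi @ P, pi))   # True: pi is a left eigenvector of P
Q = pi[:, None] * P              # Q(v, w) = pi(v) P(v, w)
print(np.allclose(Q, Q.T))       # True: detailed balance holds
```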

If G is bipartite then the walk as described is not ergodic, because all cycles are of even length. This is usually handled by adding d_v loops to vertex v, for each vertex v. (Each loop counts as a single exit from v.) The net effect of this is to make the particle stay put with probability 1/2 at each step. The steady state is unaffected. The chain is now lazy: a chain is lazy if P(ω, ω) ≥ 1/2 for all ω ∈ Ω.

If p_0(ω) = Pr(X_0 = ω), then p_t(σ) = Σ_ω p_0(ω)P^t(ω, σ) is the distribution at time t. As a measure of convergence, the natural choice in this context is variation distance. The mixing time of the chain is then

τ(ε) = max_{p_0} min{t : D_tv(p_t, π) ≤ ε},

and it is easy to show that the maximum occurs when X_0 = ω_0, with probability one, for some state ω_0. This is because D_tv(p_t, π) is a convex function of p_0, and so the maximum of D_tv(p_t, π) occurs at an extreme point of the set of probabilities p_0. [I think this should be moved to the next chapter.]

We now provide a simple lemma which indicates that the variation distance D_tv(p_t, π) goes to zero exponentially. We define several related quantities: p_t^{(i)} denotes the t-step distribution, conditional on X_0 = i, and

d_i(t) = D_tv(p_t^{(i)}, π),   d(t) = max_i d_i(t),   d̄(t) = max_{i,j} D_tv(p_t^{(i)}, p_t^{(j)}).

Lemma For all s, t ≥ 0,

(a) d̄(s + t) ≤ d̄(s) d̄(t).
(b) d(s + t) ≤ 2 d(s) d(t).
(c) d̄(s) ≤ 2 d(s).
(d) d(s) ≤ d(t) for s ≥ t.

Proof We will use the characterisation of variation distance as

D_tv(µ_1, µ_2) = min Pr(X_1 ≠ X_2),   (1.17)

where the minimum is taken over pairs of random variables X_1, X_2 such that X_i has distribution µ_i, i = 1, 2. Fix states i_1, i_2 and times s, t, and let Y^1, Y^2 denote the chains started at i_1, i_2 respectively. By (1.17) we can construct a joint distribution for (Y^1_s, Y^2_s) such that

Pr(Y^1_s ≠ Y^2_s) = D_tv(p_s^{(i_1)}, p_s^{(i_2)}) ≤ d̄(s).

Now for each pair j_1, j_2 we can use (1.17) to construct a joint distribution for (Y^1_{s+t}, Y^2_{s+t}) such that

Pr(Y^1_{s+t} ≠ Y^2_{s+t} | Y^1_s = j_1, Y^2_s = j_2) = D_tv(p_t^{(j_1)}, p_t^{(j_2)}).

The RHS is 0 if j_1 = j_2 and otherwise at most d̄(t). So, unconditionally,

Pr(Y^1_{s+t} ≠ Y^2_{s+t}) ≤ d̄(s) d̄(t),

and (1.17) establishes part (a) of the lemma. For part (b), the same argument, with Y^2 now being the stationary chain, shows

d(s + t) ≤ d(s) d̄(t),   (1.18)

and so (b) will follow from (c), which follows from the triangle inequality for variation distance. Finally note that (d) follows from (1.18), since d̄(t) ≤ 1.

We will for the most part use carefully defined Markov chains as our good samplers. As an example, we now define a simple chain with state space Ω equal to the collection of independent sets of a graph G. The chain is ergodic and its steady state is uniform over Ω, so running the chain for sufficiently long will produce a near uniformly chosen independent set; see (1.12). Unfortunately, this chain does not have a small enough mixing time to qualify as a good sampler unless Δ(G) ≤ 4. We define the chain as follows: suppose X_t = I. Then we choose a vertex v of G uniformly at random. If v ∈ I then we put X_{t+1} = I \ {v}. If v ∉ I and I ∪ {v} is an independent set, then we put X_{t+1} = I ∪ {v}. Otherwise we let X_{t+1} = X_t = I. Thus, with n = |V| and I, J independent sets of G, the transition matrix can be described as follows:

P(I, J) = 1/n if |I ⊕ J| = 1, and P(I, J) = 0 otherwise (J ≠ I).

Here I ⊕ J denotes the symmetric difference (I \ J) ∪ (J \ I). The chain satisfies M1 and M2: in D_M every state can reach, and is reachable from, the empty set ∅, implying that M1 holds; also, D_M contains loops unless G has no edges, and in both cases M2 holds trivially. Note finally that P(I, J) = P(J, I), and so (1.15) holds with π(I) = 1/|Ω|. Thus the chain is reversible and the steady state is uniform. (One transition of this chain is sketched in code below.)

1.4 A formal computational framework

The sample spaces we have in mind are sets of combinatorial objects. However, in order to discuss the computational complexity of generation, it is necessary to consider a sequence of instances of increasing size. We therefore work within the following formal framework.
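A minimal sketch of one transition of the insert/delete chain on independent sets described above (illustrative Python; the names are assumptions):

```python
import random

def insert_delete_step(I, adj, n):
    """One transition of the insert/delete chain on independent sets.
    I: current independent set (a set of vertices 0..n-1); adj: adjacency
    lists of G.  A blocked insertion leaves the state unchanged."""
    v = random.randrange(n)
    if v in I:
        return I - {v}                      # delete v
    if all(u not in I for u in adj[v]):
        return I | {v}                      # insert v: still independent
    return I                                # blocked move: stay put

# Tiny example: a triangle; its independent sets are {}, {0}, {1}, {2}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
I = set()
for _ in range(1000):
    I = insert_delete_step(I, adj, 3)
```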

CHAPTER 3

Markov Chain Monte Carlo: Metropolis and Glauber Chains

3.1. Introduction

Given an irreducible transition matrix P, there is a unique stationary distribution π satisfying π = πP, which we constructed in Section 1.5. We now consider the inverse problem: given a probability distribution π on X, can we find a transition matrix P for which π is its stationary distribution? The following example illustrates why this is a natural problem to consider.

A random sample from a finite set X will mean a random uniform selection from X, i.e., one such that each element has the same chance 1/|X| of being chosen.

Fix a set {1, 2, ..., q} of colors. A proper q-coloring of a graph G = (V, E) is an assignment of colors to the vertices V, subject to the constraint that neighboring vertices do not receive the same color. There are (at least) two reasons to look for an efficient method to sample from X, the set of all proper q-colorings. If a random sample can be produced, then the size of X can be estimated (as we discuss in detail in Section ). Also, if it is possible to sample from X, then average characteristics of colorings can be studied via simulation.

For some graphs, e.g. trees, there are simple recursive methods for generating a random proper coloring (see Example 14.12). However, for other graphs it can be challenging to directly construct a random sample. One approach is to use Markov chains to sample: suppose that (X_t) is a chain with state space X and with stationary distribution uniform on X (in Section 3.3, we will construct one such chain). By the Convergence Theorem (Theorem 4.9, whose proof we have not yet given but have often foreshadowed), X_t is approximately uniformly distributed when t is large. This method of sampling from a given probability distribution is called Markov chain Monte Carlo. Suppose π is a probability distribution on X. If a Markov chain (X_t) with stationary distribution π can be constructed, then, for t large enough, the distribution of X_t is close to π. The focus of this book is to determine how large t must be to obtain a sufficiently close approximation. In this chapter we will focus on the task of finding chains with a given stationary distribution.

3.2. Metropolis Chains

Given some chain with state space X and an arbitrary stationary distribution, can the chain be modified so that the new chain has the stationary distribution π? The Metropolis algorithm accomplishes this.

3.2.1. Symmetric base chain. Suppose that Ψ is a symmetric transition matrix. In this case, Ψ is reversible with respect to the uniform distribution on X.

We now show how to modify transitions made according to Ψ to obtain a chain with stationary distribution π, given an arbitrary probability distribution π on X. The new chain evolves as follows: when at state x, a candidate move is generated from the distribution Ψ(x, ·). If the proposed new state is y, then the move is censored with probability 1 − a(x, y). That is, with probability a(x, y), the state y is accepted so that the next state of the chain is y, and with the remaining probability 1 − a(x, y), the chain remains at x. Rejecting moves slows the chain and can reduce its computational efficiency but may be necessary to achieve a specific stationary distribution. We will discuss how to choose the acceptance probability a(x, y) below, but for now observe that the transition matrix P of the new chain is

P(x, y) = Ψ(x, y) a(x, y)   if y ≠ x,
P(x, x) = 1 − Σ_{z : z ≠ x} Ψ(x, z) a(x, z).

By Proposition 1.20, the transition matrix P has stationary distribution π if

π(x)Ψ(x, y)a(x, y) = π(y)Ψ(y, x)a(y, x)   (3.1)

for all x ≠ y. Since we have assumed Ψ is symmetric, equation (3.1) holds if and only if

b(x, y) = b(y, x),   (3.2)

where b(x, y) = π(x)a(x, y). Because a(x, y) is a probability and must satisfy a(x, y) ≤ 1, the function b must obey the constraints

b(x, y) ≤ π(x),   b(x, y) = b(y, x) ≤ π(y).   (3.3)

Since rejecting the moves of the original chain Ψ is wasteful, a solution b to (3.2) and (3.3) should be chosen which is as large as possible. Clearly, all solutions are bounded above by b(x, y) := π(x) ∧ π(y) := min{π(x), π(y)}. For this choice, the acceptance probability a(x, y) is equal to (π(y)/π(x)) ∧ 1. The Metropolis chain for a probability π and a symmetric transition matrix Ψ is defined as

P(x, y) = Ψ(x, y) [1 ∧ (π(y)/π(x))]   if y ≠ x,
P(x, x) = 1 − Σ_{z : z ≠ x} Ψ(x, z) [1 ∧ (π(z)/π(x))].

Our discussion above shows that π is indeed a stationary distribution for the Metropolis chain.

Remark 3.1. A very important feature of the Metropolis chain is that it only depends on the ratios π(x)/π(y). In many cases of interest, π(x) has the form h(x)/Z, where the function h : X → [0, ∞) is known and Z = Σ_{x∈X} h(x) is a normalizing constant. It may be difficult to explicitly compute Z, especially if X is large. Because the Metropolis chain only depends on h(x)/h(y), it is not necessary to compute the constant Z in order to simulate the chain. The optimization chains described below (Example 3.2) are examples of this type.
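In code, one Metropolis update for a symmetric proposal takes the following form. This is a minimal sketch (the names metropolis_step, propose and pi are illustrative, not from the text):

```python
import random

def metropolis_step(x, propose, pi):
    """One step of the Metropolis chain for a symmetric proposal.
    propose(x) draws y from Psi(x, .); pi may be unnormalised, since only
    the ratio pi(y)/pi(x) is used (Remark 3.1)."""
    y = propose(x)
    if random.random() < min(1.0, pi(y) / pi(x)):
        return y          # accept the candidate move
    return x              # censor the move: stay at x
```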

Example 3.2 (Optimization). Let f be a real-valued function defined on the vertex set X of a graph. In many applications it is desirable to find a vertex x where f(x) is maximal. If the domain X is very large, then an exhaustive search may be too expensive.

[Figure 3.1: A hill climb algorithm may become trapped at a local maximum.]

A hill climb is an algorithm which attempts to locate the maximum values of f as follows: when at x, if there is at least one neighbor y of x satisfying f(y) > f(x), move to a neighbor with the largest value of f. The climber may become stranded at local maxima; see Figure 3.1. One solution is to randomize moves so that instead of always remaining at a local maximum, with some probability the climber moves to lower states.

Suppose for simplicity that X is a regular graph, so that simple random walk on X has a symmetric transition matrix. Fix λ ≥ 1 and define

π_λ(x) = λ^{f(x)} / Z(λ),

where Z(λ) := Σ_{x∈X} λ^{f(x)} is the normalizing constant that makes π_λ a probability measure (as mentioned in Remark 3.1, running the Metropolis chain does not require computation of Z(λ), which may be prohibitively expensive to compute). Since π_λ(x) is increasing in f(x), the measure π_λ favors vertices x for which f(x) is large.

If f(y) < f(x), the Metropolis chain accepts a transition x → y with probability λ^{−[f(x) − f(y)]}. As λ → ∞, the chain more closely resembles the deterministic hill climb. Define

X* := {x ∈ X : f(x) = f* := max_{y∈X} f(y)}.

Then

lim_{λ→∞} π_λ(x) = lim_{λ→∞} (λ^{f(x)}/λ^{f*}) / (|X*| + Σ_{x'∈X\X*} λ^{f(x')}/λ^{f*}) = 1{x ∈ X*} / |X*|.

That is, as λ → ∞, the stationary distribution π_λ of this Metropolis chain converges to the uniform distribution over the global maxima of f.
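A minimal sketch of this randomized hill climb, reusing the metropolis_step sketch above (the graph, f and λ here are illustrative choices, not from the text):

```python
import random

# Metropolis chain for pi_lambda(x) ~ lam ** f(x) on the cycle Z_n.
# Simple random walk on the cycle is 2-regular, hence symmetric.
n, lam = 100, 1.5
f = lambda x: -abs(x - 37)                              # unique maximum at 37
propose = lambda x: (x + random.choice([-1, 1])) % n    # symmetric proposal
pi = lambda x: lam ** f(x)                              # unnormalised

x = 0
for _ in range(20000):
    x = metropolis_step(x, propose, pi)
print(x)   # typically near 37: pi_lambda concentrates on the maxima of f
```

A downhill move of size 1 is accepted with probability λ^{−1}, so larger λ makes the walk behave more like the deterministic hill climb.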

3.2.2. General base chain. The Metropolis chain can also be defined when the initial transition matrix is not symmetric. For a general (irreducible) transition matrix Ψ and an arbitrary probability distribution π on X, the Metropolized chain is executed as follows. When at state x, generate a state y from Ψ(x, ·). Move to y with probability

[π(y)Ψ(y, x) / (π(x)Ψ(x, y))] ∧ 1,   (3.4)

and remain at x with the complementary probability. The transition matrix P for this chain is

P(x, y) = Ψ(x, y) [(π(y)Ψ(y, x))/(π(x)Ψ(x, y)) ∧ 1]   if y ≠ x,
P(x, x) = 1 − Σ_{z : z ≠ x} Ψ(x, z) [(π(z)Ψ(z, x))/(π(x)Ψ(x, z)) ∧ 1].   (3.5)

The reader should check that the transition matrix (3.5) defines a reversible Markov chain with stationary distribution π (see Exercise 3.1).

Example 3.3. Suppose you know neither the vertex set V nor the edge set E of a graph G. However, you are able to perform a simple random walk on G. (Many computer and social networks have this form; each vertex knows who its neighbors are, but not the global structure of the graph.) If the graph is not regular, then the stationary distribution is not uniform, so the distribution of the walk will not converge to uniform. You desire a uniform sample from V. We can use the Metropolis algorithm to modify the simple random walk and ensure a uniform stationary distribution. The acceptance probability in (3.4) reduces in this case to

(deg(x)/deg(y)) ∧ 1.

This biases the walk against moving to higher degree vertices, giving a uniform stationary distribution. Note that it is not necessary to know the size of the vertex set to perform this modification, which can be an important consideration in applications.

3.3. Glauber Dynamics

We will study many chains whose state spaces are contained in a set of the form S^V, where V is the vertex set of a graph and S is a finite set. The elements of S^V, called configurations, are the functions from V to S. We visualize a configuration as a labeling of vertices with elements of S.

Given a probability distribution π on a space of configurations, the Glauber dynamics for π, to be defined below, is a Markov chain which has stationary distribution π. This chain is often called the Gibbs sampler, especially in statistical contexts.

3.3.1. Two examples. As we defined in Section 3.1, a proper q-coloring of a graph G = (V, E) is an element x of {1, 2, ..., q}^V, the set of functions from V to {1, 2, ..., q}, such that x(v) ≠ x(w) for all edges {v, w}. We construct here a Markov chain on the set of proper q-colorings of G. For a given configuration x and a vertex v, call a color j allowable at v if j is different from all colors assigned to neighbors of v. That is, a color is allowable at v if it does not belong to the set {x(w) : w ∼ v}. Given a proper q-coloring x, we can generate a new coloring by selecting a vertex v ∈ V at random, selecting a color j uniformly at random from the allowable colors at v, and recoloring v with the color j.
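A minimal sketch of this update rule for proper colorings (illustrative Python; the function name and example graph are assumptions):

```python
import random

def glauber_coloring_step(x, adj, q):
    """One update of the Glauber dynamics for proper q-colorings described
    above: pick a vertex v uniformly at random, then recolor v with a color
    chosen uniformly from the colors allowable at v.  (If q exceeds the
    maximum degree, at least one color is always allowable.)"""
    v = random.randrange(len(x))
    neighbor_colors = {x[w] for w in adj[v]}
    allowable = [j for j in range(1, q + 1) if j not in neighbor_colors]
    x = list(x)
    x[v] = random.choice(allowable)
    return x

# 4-cycle with q = 3 colors, started from a proper coloring.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
x = [1, 2, 1, 2]
for _ in range(1000):
    x = glauber_coloring_step(x, adj, 3)
```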

which is a useful identity.

Remark 4.4. From Proposition 4.2 and the triangle inequality for real numbers, it is easy to see that total variation distance satisfies the triangle inequality: for probability distributions µ, ν and η,

‖µ − ν‖_TV ≤ ‖µ − η‖_TV + ‖η − ν‖_TV.   (4.6)

Proposition 4.5. Let µ and ν be two probability distributions on X. Then the total variation distance between them satisfies

‖µ − ν‖_TV = (1/2) sup{ Σ_{x∈X} f(x)µ(x) − Σ_{x∈X} f(x)ν(x) : f satisfying max_{x∈X} |f(x)| ≤ 1 }.   (4.7)

Proof. If max |f(x)| ≤ 1, then

(1/2) |Σ_{x∈X} f(x)µ(x) − Σ_{x∈X} f(x)ν(x)| ≤ (1/2) Σ_{x∈X} |µ(x) − ν(x)| = ‖µ − ν‖_TV.

Thus, the right-hand side of (4.7) is at most ‖µ − ν‖_TV. For the other direction, define

f*(x) = 1 if µ(x) ≥ ν(x), and f*(x) = −1 if µ(x) < ν(x).

Then

(1/2) [Σ_{x∈X} f*(x)µ(x) − Σ_{x∈X} f*(x)ν(x)] = (1/2) Σ_{x∈X} f*(x)[µ(x) − ν(x)]
= (1/2) [ Σ_{x : µ(x)≥ν(x)} [µ(x) − ν(x)] + Σ_{x : ν(x)>µ(x)} [ν(x) − µ(x)] ].

Using (4.5) shows that the right-hand side above equals ‖µ − ν‖_TV. Hence the right-hand side of (4.7) is at least ‖µ − ν‖_TV.
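Both characterizations are easy to check numerically. A minimal sketch (illustrative, with hand-picked distributions):

```python
import numpy as np

# Total variation distance computed two ways: half the L1 norm, as in
# (4.5), and the supremum (4.7), where the optimizer f* is the sign of
# mu - nu as in the proof above.
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.25, 0.25, 0.5])
tv_l1 = 0.5 * np.abs(mu - nu).sum()
f_star = np.where(mu >= nu, 1.0, -1.0)
tv_sup = 0.5 * (f_star @ mu - f_star @ nu)
print(tv_l1, tv_sup)   # both 0.3
```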

4.2. Coupling and Total Variation Distance

A coupling of two probability distributions µ and ν is a pair of random variables (X, Y) defined on a single probability space such that the marginal distribution of X is µ and the marginal distribution of Y is ν. That is, a coupling (X, Y) satisfies P{X = x} = µ(x) and P{Y = y} = ν(y).

Coupling is a general and powerful technique; it can be applied in many different ways. Indeed, Chapters 5 and 14 use couplings of entire chain trajectories to bound rates of convergence to stationarity. Here, we offer a gentle introduction by showing the close connection between couplings of two random variables and the total variation distance between those variables.

Example 4.6. Let µ and ν both be the fair coin measure giving weight 1/2 to each of the elements of {0, 1}.

(i) One way to couple µ and ν is to define (X, Y) to be a pair of independent coins, so that P{X = x, Y = y} = 1/4 for all x, y ∈ {0, 1}.

(ii) Another way to couple µ and ν is to let X be a fair coin toss and define Y = X. In this case, P{X = Y = 0} = 1/2, P{X = Y = 1} = 1/2, and P{X ≠ Y} = 0.

Given a coupling (X, Y) of µ and ν, if q is the joint distribution of (X, Y) on X × X, meaning that q(x, y) = P{X = x, Y = y}, then q satisfies

Σ_{y∈X} q(x, y) = Σ_{y∈X} P{X = x, Y = y} = P{X = x} = µ(x)

and

Σ_{x∈X} q(x, y) = Σ_{x∈X} P{X = x, Y = y} = P{Y = y} = ν(y).

Conversely, given a probability distribution q on the product space X × X which satisfies

Σ_{y∈X} q(x, y) = µ(x) and Σ_{x∈X} q(x, y) = ν(y),

there is a pair of random variables (X, Y) having q as their joint distribution, and consequently this pair (X, Y) is a coupling of µ and ν. In summary, a coupling can be specified either by a pair of random variables (X, Y) defined on a common probability space or by a distribution q on X × X.

Returning to Example 4.6, the coupling in part (i) could equivalently be specified by the probability distribution q_1 on {0, 1}² given by

q_1(x, y) = 1/4 for all (x, y) ∈ {0, 1}².

Likewise, the coupling in part (ii) can be identified with the probability distribution q_2 given by

q_2(x, y) = 1/2 if (x, y) = (0, 0) or (x, y) = (1, 1), and q_2(x, y) = 0 if (x, y) = (0, 1) or (x, y) = (1, 0).
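A quick numerical look at these two couplings (a minimal sketch, not from the text):

```python
import numpy as np

# The two couplings of a pair of fair coins from Example 4.6, written as
# joint distributions on {0,1}^2.  Both have uniform marginals, but they
# differ in P(X != Y).
q1 = np.full((2, 2), 0.25)                 # independent coupling, part (i)
q2 = np.array([[0.5, 0.0], [0.0, 0.5]])    # identical coupling Y = X, part (ii)
for q in (q1, q2):
    print(q.sum(axis=1), q.sum(axis=0))    # marginals: both fair coins
    print(q[0, 1] + q[1, 0])               # P(X != Y): 0.5, then 0.0
```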

Any two distributions µ and ν have an independent coupling. However, when µ and ν are not identical, it will not be possible for X and Y to always have the same value. How close can a coupling get to having X and Y identical? Total variation distance gives the answer.

Proposition 4.7. Let µ and ν be two probability distributions on X. Then

‖µ − ν‖_TV = inf{ P{X ≠ Y} : (X, Y) is a coupling of µ and ν }.   (4.8)

Remark 4.8. We will in fact show that there is a coupling (X, Y) which attains the infimum in (4.8). We will call such a coupling optimal.

Proof. First, we note that for any coupling (X, Y) of µ and ν and any event A ⊆ X,

µ(A) − ν(A) = P{X ∈ A} − P{Y ∈ A}   (4.9)
≤ P{X ∈ A, Y ∉ A}   (4.10)
≤ P{X ≠ Y}.   (4.11)

(Dropping the event {X ∈ A, Y ∈ A} from the second term of the difference gives the first inequality.) It immediately follows that

‖µ − ν‖_TV ≤ inf{ P{X ≠ Y} : (X, Y) is a coupling of µ and ν }.   (4.12)

[Figure 4.2: Since each of regions I and II has area ‖µ − ν‖_TV and µ and ν are probability measures, region III has area 1 − ‖µ − ν‖_TV.]

It will suffice to construct a coupling for which P{X ≠ Y} is exactly equal to ‖µ − ν‖_TV. We will do so by forcing X and Y to be equal as often as they possibly can be. Consider Figure 4.2. Region III, bounded by µ(x) ∧ ν(x) = min{µ(x), ν(x)}, can be seen as the overlap between the two distributions. Informally, our coupling proceeds by choosing a point in the union of regions I and III, and setting X to be the x-coordinate of this point. If the point is in III, we set Y = X, and if it is in I, then we choose independently a point at random from region II, and set Y to be the x-coordinate of the newly selected point. In the second scenario, X ≠ Y, since the two regions are disjoint.

More formally, we use the following procedure to generate X and Y. Let

p = Σ_{x∈X} µ(x) ∧ ν(x).

Write

Σ_{x∈X} µ(x) ∧ ν(x) = Σ_{x : µ(x)≤ν(x)} µ(x) + Σ_{x : µ(x)>ν(x)} ν(x).

Adding and subtracting Σ_{x : µ(x)>ν(x)} µ(x) to the right-hand side above shows that

Σ_{x∈X} µ(x) ∧ ν(x) = 1 − Σ_{x : µ(x)>ν(x)} [µ(x) − ν(x)].

By equation (4.5) and the immediately preceding equation,

Σ_{x∈X} µ(x) ∧ ν(x) = 1 − ‖µ − ν‖_TV = p.   (4.13)

Flip a coin with probability of heads equal to p.

(i) If the coin comes up heads, then choose a value Z according to the probability distribution

γ_III(x) = (µ(x) ∧ ν(x)) / p,

and set X = Y = Z.

(ii) If the coin comes up tails, choose X according to the probability distribution

γ_I(x) = (µ(x) − ν(x)) / ‖µ − ν‖_TV if µ(x) > ν(x), and 0 otherwise,

and independently choose Y according to the probability distribution

γ_II(x) = (ν(x) − µ(x)) / ‖µ − ν‖_TV if ν(x) > µ(x), and 0 otherwise.

Note that (4.5) ensures that γ_I and γ_II are probability distributions. Clearly,

pγ_III + (1 − p)γ_I = µ,
pγ_III + (1 − p)γ_II = ν,

so that the distribution of X is µ and the distribution of Y is ν. Note that in the case that the coin lands tails up, X ≠ Y, since γ_I and γ_II are positive on disjoint subsets of X. Thus X = Y if and only if the coin toss is heads. We conclude that P{X ≠ Y} = ‖µ − ν‖_TV.

4.3. The Convergence Theorem

We are now ready to prove that irreducible, aperiodic Markov chains converge to their stationary distributions, a key step, as much of the rest of the book will be devoted to estimating the rate at which this convergence occurs. The assumption of aperiodicity is indeed necessary; recall the even n-cycle of Example 1.4.

As is often true of such fundamental facts, there are many proofs of the Convergence Theorem. The one given here decomposes the chain into a mixture of repeated independent sampling from the stationary distribution and another Markov chain. See Exercise 5.1 for another proof using two coupled copies of the chain.

Theorem 4.9 (Convergence Theorem). Suppose that P is irreducible and aperiodic, with stationary distribution π. Then there exist constants α ∈ (0, 1) and C > 0 such that

max_{x∈X} ‖P^t(x, ·) − π‖_TV ≤ Cα^t.   (4.14)

Proof. Since P is irreducible and aperiodic, by Proposition 1.7 there exists an r such that P^r has strictly positive entries. Let Π be the matrix with |X| rows, each of which is the row vector π. For sufficiently small δ > 0, we have

P^r(x, y) ≥ δπ(y) for all x, y ∈ X.

Let θ = 1 − δ. The equation

P^r = (1 − θ)Π + θQ   (4.15)

defines a stochastic matrix Q. It is a straightforward computation to check that MΠ = Π for any stochastic matrix M and that ΠM = Π for any matrix M such that πM = π. Next, we use induction to demonstrate that

P^{rk} = (1 − θ^k)Π + θ^k Q^k.   (4.16)
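As an aside, the optimal coupling constructed in the proof of Proposition 4.7 is easy to sample from directly. A minimal sketch (illustrative Python, using numpy; names are assumptions):

```python
import numpy as np

def optimal_coupling_sample(mu, nu, rng):
    """Draw (X, Y) from the coupling in the proof of Proposition 4.7:
    with probability p = 1 - ||mu - nu||_TV set X = Y, sampled from the
    overlap gamma_III; otherwise sample X from gamma_I and Y from
    gamma_II, which live on disjoint sets, so X != Y."""
    overlap = np.minimum(mu, nu)
    p = overlap.sum()                       # = 1 - TV distance, by (4.13)
    if rng.random() < p:
        x = rng.choice(len(mu), p=overlap / p)
        return x, x
    gamma_I = np.clip(mu - nu, 0, None)
    gamma_II = np.clip(nu - mu, 0, None)
    x = rng.choice(len(mu), p=gamma_I / gamma_I.sum())
    y = rng.choice(len(nu), p=gamma_II / gamma_II.sum())
    return x, y

rng = np.random.default_rng(0)
mu, nu = np.array([0.5, 0.3, 0.2]), np.array([0.25, 0.25, 0.5])
draws = [optimal_coupling_sample(mu, nu, rng) for _ in range(100000)]
print(np.mean([x != y for x, y in draws]))   # ~0.3 = ||mu - nu||_TV
```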

i.e., we change the i-th component from x_i to y_i. Note that some of the edges may be loops (if x_i = y_i). To compute ρ, fix attention on a particular (oriented) edge

t = (w, w') = ((w_0, ..., w_i, ..., w_{n−1}), (w_0, ..., w'_i, ..., w_{n−1})),

and consider the number of canonical paths γ_xy that include t. The number of possible choices for x is 2^i, as the final n − i positions are determined by x_j = w_j for j ≥ i; and by a similar argument the number of possible choices for y is 2^{n−i−1}. Thus the total number of canonical paths using a particular edge t is 2^{n−1}; furthermore, Q(w, w') = π(w)P(w, w') = 2^{−n}(2n)^{−1}, and the length of every canonical path is exactly n. Plugging all these bounds into the definition of ρ yields ρ ≤ n². Thus, by Theorem 2.2.4, the mixing time of W_n is

τ(ε) ≤ n²(n ln 2 + ln ε^{−1}).

2.2.1 Comparison Theorems

2.2.2 Decomposition Theorem

2.3 Coupling

A coupling C(M) for M is a stochastic process (X_t, Y_t) on Ω × Ω such that each of X_t, Y_t is marginally a copy of M,

Pr(X_t = σ_1 | X_{t−1} = ω_1) = P(ω_1, σ_1),
Pr(Y_t = σ_2 | Y_{t−1} = ω_2) = P(ω_2, σ_2)   (∀t > 0).   (2.18)

The following simple but powerful inequality then follows easily from these definitions.

Lemma (Coupling Lemma) Let X_t, Y_t be a coupling for M such that Y_0 has the stationary distribution π. Then, if X_t has distribution p_t,

D_tv(p_t, π) ≤ Pr(X_t ≠ Y_t).   (2.19)

Proof Suppose A_t ⊆ Ω maximizes in (1.3). Then, since Y_t has distribution π,

D_tv(p_t, π) = Pr(X_t ∈ A_t) − Pr(Y_t ∈ A_t) ≤ Pr(X_t ∈ A_t, Y_t ∉ A_t) ≤ Pr(X_t ≠ Y_t).

It is important to remember that the Markov chain Y_t is simply a proof construct, and X_t the chain we actually observe. We also require that X_t = Y_t implies X_{t+1} = Y_{t+1},

since this makes the right side of (2.19) nonincreasing. Then the earliest epoch T at which X_T = Y_T is called coalescence, making T a random variable. A successful coupling is one for which lim_{t→∞} Pr(X_t ≠ Y_t) = 0. Clearly we are only interested in successful couplings.

As an example consider our random walk on the cube Q_n. We can define a coupling as follows: given (X_t, Y_t) we

(a) Choose i uniformly at random from [n].

(b) Put X_{t+1,j} = X_{t,j} and Y_{t+1,j} = Y_{t,j} for j ≠ i.

(c) If X_{t,i} = Y_{t,i} then

X_{t+1,i} = Y_{t+1,i} = X_{t,i} with probability 1/2, and = 1 − X_{t,i} with probability 1/2;

(d) otherwise

(X_{t+1,i}, Y_{t+1,i}) = (X_{t,i}, 1 − Y_{t,i}) with probability 1/2, and = (1 − X_{t,i}, Y_{t,i}) with probability 1/2.

It should hopefully be clear that this is a coupling, i.e. the marginals are correct and X_t = Y_t implies X_{t+1} = Y_{t+1}. Now let I_t = {i : i is chosen in (a) during steps 1, 2, ..., t}. Then I_t = [n] implies that X_τ = Y_τ for τ ≥ t. So

Pr(X_t ≠ Y_t) ≤ Pr(I_t ≠ [n]) = Pr(Ī_t ≠ ∅) ≤ E(|Ī_t|) = n(1 − 1/n)^t.

So if t = n(log n + log ε^{−1}) we have d_TV(p_t, π) ≤ ε.
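This coupling is easy to simulate. The following minimal sketch (illustrative Python, not from the notes) estimates the coalescence time T, which behaves like coupon collecting:

```python
import random

def coupled_cube_step(x, y, n):
    """One coupled step of the lazy walk on Q_n following (a)-(d) above:
    once coordinate i has been chosen, x[i] and y[i] agree forever."""
    i = random.randrange(n)
    if x[i] == y[i]:
        x[i] = y[i] = random.randrange(2)   # same fair coin for both chains
    elif random.random() < 0.5:
        y[i] = x[i]    # case (X_{t,i}, 1 - Y_{t,i}): coalesce to x[i]
    else:
        x[i] = y[i]    # case (1 - X_{t,i}, Y_{t,i}): coalesce to y[i]

# Coalescence time is roughly n log n, here with n = 50.
n = 50
x, y = [0] * n, [1] * n
t = 0
while x != y:
    coupled_cube_step(x, y, n)
    t += 1
print(t)
```

In both branches of the unequal case each marginal flips coordinate i with probability 1/2, as the lazy walk requires, yet coordinate i always coalesces.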

A coupling is a Markovian coupling if the process C(M) is a Markov chain on Ω × Ω. There always exists a maximal coupling, which gives equality in (2.19). This maximal coupling is in general non-Markovian, and is seemingly not constructible without knowing p_t (t = 1, 2, ...). But coupling has little algorithmic value if we already know p_t. More generally, it seems difficult to prove mixing properties of non-Markovian couplings in our setting. Therefore we restrict attention to Markovian couplings, at the (probable) cost of sacrificing equality in (2.19).

Let C(M) be a Markovian coupling, with Q its transition matrix, i.e. the probability of a joint transition from (ω_1, ω_2) to (σ_1, σ_2) is Q^{ω_1 ω_2}_{σ_1 σ_2}. The precise conditions required of Q are then

Q^{ωω}_{σ_1 σ_2} ≠ 0 implies σ_1 = σ_2   (∀ω ∈ Ω),   (2.20)

Σ_{σ_2∈Ω} Q^{ω_1 ω_2}_{σ_1 σ_2} = P_{ω_1 σ_1}   (∀ω_2 ∈ Ω),   Σ_{σ_1∈Ω} Q^{ω_1 ω_2}_{σ_1 σ_2} = P_{ω_2 σ_2}   (∀ω_1 ∈ Ω).   (2.21)

Here (2.20) implies equality after coalescence, and (2.21) implies the marginals are copies of M. Our goal is to design Q so that Pr(X_t ≠ Y_t) quickly becomes small. We need only specify Q to satisfy (2.21) for ω_1 ≠ ω_2. The other entries are completely determined by (2.20) and (2.21).

In general, to prove rapid mixing using coupling, it is usual to map C(M) to a process on N by defining a function ψ : Ω × Ω → N such that ψ(ω_1, ω_2) = 0 implies ω_1 = ω_2. We call this a proximity function. Then Pr(X_t ≠ Y_t) ≤ E(ψ(X_t, Y_t)) by Markov's inequality, and we need only show that E(ψ(X_t, Y_t)) converges quickly to zero.

2.4 Path coupling

A major difficulty with coupling is that we are obliged to specify it, and show improvement in the proximity function, for every pair of states. The idea of path coupling, where applicable, can be a major saving in this respect. We describe the approach below.

As a simple example of this approach consider a Markov chain where Ω ⊆ S^m for some set S and positive integer m. Suppose also that if ω, σ ∈ Ω and h(ω, σ) = d (Hamming distance), then there exists a sequence ω = x_0, x_1, ..., x_d = σ such that (i) {x_0, x_1, ..., x_d} ⊆ Ω, (ii) h(x_i, x_{i+1}) = 1 for i = 0, 1, ..., d − 1, and (iii) P(x_i, x_{i+1}) > 0.

Now suppose we define a coupling of the chains (X_t, Y_t) only for the case where h(X_t, Y_t) = 1. Suppose then that

E(h(X_{t+1}, Y_{t+1}) | h(X_t, Y_t) = 1) ≤ β   (2.22)

for some β < 1. Then

E(h(X_{t+1}, Y_{t+1})) ≤ β h(X_t, Y_t)   (2.23)

in all cases. It then follows that

d_TV(p_t, π) ≤ Pr(X_t ≠ Y_t) ≤ mβ^t.

Equation (2.23) is shown by choosing a sequence X_t = Z_0, Z_1, ..., Z_d = Y_t, d = h(X_t, Y_t), where Z_0, Z_1, ..., Z_d satisfy (i), (ii), (iii) above. Then we can couple Z_i and Z_{i+1}, 0 ≤ i < d, so that X_{t+1} = Z'_0, Z'_1, ..., Z'_d = Y_{t+1} and (i) Pr(Z'_i = σ | Z_i = ω) = P(ω, σ) and (ii)

E(h(Z'_i, Z'_{i+1})) ≤ β. Therefore

E(h(X_{t+1}, Y_{t+1})) ≤ Σ_{i=1}^{d} E(h(Z'_{i−1}, Z'_i)) ≤ βd,

and (2.23) follows.

As an example, let G = (V, E) be a graph with maximum degree Δ and let k be an integer. Let Ω_k be the set of proper k-vertex-colourings of G, i.e. {c : V → [k]} such that (v, w) ∈ E implies c(v) ≠ c(w). We describe a chain which provides a good sampler for the uniform distribution over Ω_k. We let Ω = [k]^V be all k-colourings, including improper ones, and describe a chain on Ω for which only proper colourings have a positive steady state probability. To describe a general step of the chain assume X_t ∈ Ω. Then

Step 1 Choose w uniformly from V and x uniformly from [k].

Step 2 Let X_{t+1}(v) = X_t(v) for v ∈ V \ {w}.

Step 3 If no neighbour of w in G has colour x then put X_{t+1}(w) = x, otherwise put X_{t+1}(w) = X_t(w).

Note that P(ω, σ) = P(σ, ω) = 1/(nk) for two proper colourings which can be obtained from each other by a single move of the chain. It follows from (1.15) that the steady state is uniform over Ω_k. (One transition of this chain is sketched in code below.)

We first describe a coupling which is extremely simple but needs k > 3Δ in order for (2.22) to be satisfied. Let h(X_t, Y_t) = 1 and let v_0 be the unique vertex of V such that X_t(v_0) ≠ Y_t(v_0). In our coupling we choose w, x as in Step 1 and try to colour w with x in both chains. We claim that

E(h(X_{t+1}, Y_{t+1})) ≤ 1 − (1/n)(1 − Δ/k) + 2Δ/(nk) = 1 − (k − 3Δ)/(kn),   (2.24)

and so we can take β ≤ 1 − 1/(kn) in (2.23) if k > 3Δ.

The term (1/n)(1 − Δ/k) in (2.24) lower bounds the probability that w = v_0 and that x is not used in the neighbourhood of v_0, in which case we will have X_{t+1} = Y_{t+1}. Next let c_X ≠ c_Y be the colours of v_0 in X_t, Y_t respectively. The term 2Δ/(nk) in (2.24) is an upper bound for the probability that w is in the neighbourhood of v_0 and x ∈ {c_X, c_Y}, in which case we might have h(X_{t+1}, Y_{t+1}) = 2. In all other cases we find that h(X_{t+1}, Y_{t+1}) = h(X_t, Y_t) = 1.
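A minimal sketch of one transition of the colouring chain defined in Steps 1-3 above (illustrative Python; unlike the Glauber dynamics, a blocked recolouring is simply rejected):

```python
import random

def colouring_step(c, adj, k):
    """One step of the chain above: choose a vertex w and a colour x
    uniformly; recolour w with x unless some neighbour of w has colour x,
    in which case the state is unchanged."""
    w = random.randrange(len(c))
    x = random.randrange(1, k + 1)
    if all(c[u] != x for u in adj[w]):
        c = list(c)
        c[w] = x
    return c

# Path on 4 vertices (Delta = 2) with k = 5 > 2*Delta colours,
# started at a proper colouring.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
c = [1, 2, 1, 2]
for _ in range(1000):
    c = colouring_step(c, adj, 5)
```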

A better coupling gives the desired result. We proceed as above except for the case where w is a neighbour of v_0 and x ∈ {c_X, c_Y}. In this case, with probability 1/2 we try to colour w with c_X in X_t and colour w with c_Y in Y_t, and fail in both cases. With probability 1/2 we try to colour w with c_Y in X_t and colour w with c_X in Y_t, in which case the Hamming distance may increase by one. Thus for this coupling we have

E(h(X_{t+1}, Y_{t+1})) ≤ 1 − (1/n)(1 − Δ/k) + Δ/(nk) = 1 − (k − 2Δ)/(kn),

and we can take β ≤ 1 − 1/(kn) in (2.23) if k > 2Δ.

We now give a more general framework for the definition of path coupling. Recall that a quasi-metric satisfies the conditions for a metric except possibly the symmetry condition. Any metric is a quasi-metric, but a simple example of a quasi-metric which is not a metric is directed edge distance in a digraph. Suppose we have a relation S ⊆ Ω × Ω such that S has transitive closure Ω × Ω, and suppose that we have a proximity function defined for all pairs in S, i.e. ψ : S → N. Then we may lift ψ to a quasi-metric φ(ω, ω') on Ω as follows. For each pair (ω, ω') ∈ Ω × Ω, consider the set P(ω, ω') of all sequences

ω = ω_1, ω_2, ..., ω_{r−1}, ω_r = ω' with (ω_i, ω_{i+1}) ∈ S (i = 1, ..., r − 1).   (2.25)

Then we set

φ(ω, ω') = min_{P(ω,ω')} Σ_{i=1}^{r−1} ψ(ω_i, ω_{i+1}).   (2.26)

It is easy to prove that φ is a quasi-metric. We call a sequence minimizing (2.26) geodesic. We now show that, without any real loss, we may define the (Markovian) coupling only on pairs in S. Such a coupling is called a path coupling. We give a detailed development below. Clearly S = Ω × Ω is always a relation whose transitive closure is Ω × Ω, but path coupling is only useful when we can define a suitable S which is much smaller than Ω × Ω. A relation of particular interest is R_σ from Section 1.4, but this is not always the best choice.

As in Section 2.3, we use σ (or σ_i) to denote a state obtained by performing a single transition of the chain from the state ω (or ω_i). Let P^ω_σ denote the probability of a transition from state ω to state σ in the Markov chain, and let Q^{ωω'}_{σσ'} denote the probability of a joint transition from (ω, ω') to (σ, σ'), where (ω, ω') ∈ S, as specified by the path coupling. Since this coupling has the correct marginals, we have

Σ_{σ'∈Ω} Q^{ωω'}_{σσ'} = P^ω_σ,   Σ_{σ∈Ω} Q^{ωω'}_{σσ'} = P^{ω'}_{σ'}   (∀(ω, ω') ∈ S).   (2.27)

We extend this to all pairs (ω, ω') ∈ Ω × Ω as follows. For each pair, fix a sequence (ω_1, ω_2, ..., ω_r) ∈ P(ω, ω'). We do not assume the sequence is geodesic here, or indeed

the existence of any proximity function, but this is our eventual purpose. The implied global coupling Q^{ω_1 ω_r}_{σ_1 σ_r} is then defined along this sequence by successively conditioning on the previous choice. Using (2.27), this can be written explicitly as

Q^{ω_1 ω_r}_{σ_1 σ_r} = Σ_{σ_2∈Ω} Σ_{σ_3∈Ω} ··· Σ_{σ_{r−1}∈Ω} Q^{ω_1 ω_2}_{σ_1 σ_2} (Q^{ω_2 ω_3}_{σ_2 σ_3} / P^{ω_2}_{σ_2}) ··· (Q^{ω_{r−1} ω_r}_{σ_{r−1} σ_r} / P^{ω_{r−1}}_{σ_{r−1}}).   (2.28)

Summing (2.28) over σ_r or σ_1, and again applying (2.27), causes the right side to successively simplify, giving

Σ_{σ_r∈Ω} Q^{ω_1 ω_r}_{σ_1 σ_r} = P^{ω_1}_{σ_1}   (∀ω_r ∈ Ω),   Σ_{σ_1∈Ω} Q^{ω_1 ω_r}_{σ_1 σ_r} = P^{ω_r}_{σ_r}   (∀ω_1 ∈ Ω).   (2.29)

Hence the global coupling satisfies (2.21), as we would anticipate from the properties of conditional probabilities. Now suppose the global coupling is determined by geodesic sequences. We bound the expected value of φ(σ_1, σ_r). This is

E(φ(σ_1, σ_r)) = Σ_{σ_1} ··· Σ_{σ_r} φ(σ_1, σ_r) Q^{ω_1 ω_2}_{σ_1 σ_2} (Q^{ω_2 ω_3}_{σ_2 σ_3}/P^{ω_2}_{σ_2}) ··· (Q^{ω_{r−1} ω_r}_{σ_{r−1} σ_r}/P^{ω_{r−1}}_{σ_{r−1}})
≤ Σ_{σ_1} ··· Σ_{σ_r} (Σ_{i=1}^{r−1} φ(σ_i, σ_{i+1})) Q^{ω_1 ω_2}_{σ_1 σ_2} (Q^{ω_2 ω_3}_{σ_2 σ_3}/P^{ω_2}_{σ_2}) ··· (Q^{ω_{r−1} ω_r}_{σ_{r−1} σ_r}/P^{ω_{r−1}}_{σ_{r−1}})
= Σ_{i=1}^{r−1} Σ_{σ_i} Σ_{σ_{i+1}} φ(σ_i, σ_{i+1}) Q^{ω_i ω_{i+1}}_{σ_i σ_{i+1}},   (2.30)

where we have used the triangle inequality for a quasi-metric and the same observation as that leading from (2.28) to (2.29). Suppose we can find β ≤ 1 such that, for all (ω, ω') ∈ S,

E(φ(σ, σ')) = Σ_σ Σ_{σ'} φ(σ, σ') Q^{ωω'}_{σσ'} ≤ β φ(ω, ω').   (2.31)

Then, from (2.30), (2.31) and (2.26) we have

E(φ(σ_1, σ_r)) ≤ Σ_{i=1}^{r−1} β φ(ω_i, ω_{i+1}) = β Σ_{i=1}^{r−1} φ(ω_i, ω_{i+1}) = β φ(ω_1, ω_r).   (2.32)

Thus we can show (2.31) for every pair, merely by showing that this holds for all pairs in S. To apply path coupling to a particular problem, we must find a relation S and

proximity function ψ so that this is possible. In particular we need φ(ω, ω') for (ω, ω') ∈ S to be easily deducible from ψ.

Suppose that Ω has diameter D, i.e. φ(ω, ω') ≤ D for all ω, ω' ∈ Ω. Then Pr(X_t ≠ Y_t) ≤ β^t D, and so if β < 1 we have, since log β^{−1} ≥ 1 − β,

D_tv(p_t, π) ≤ ε for t ≥ log(Dε^{−1})/(1 − β).   (2.33)

This bound is polynomial even when D is exponential in the problem size. It is also possible to prove a bound when β = 1, provided we know the quasi-metric cannot get stuck. Specifically, we need an α > 0 (inversely polynomial in the problem size) such that, in the above notation,

Pr(φ(σ, σ') ≠ φ(ω, ω')) ≥ α   (∀ω, ω' ∈ Ω).   (2.34)

Observe that it is not sufficient simply to establish (2.34) for pairs in S. However, the structure of the path coupling can usually help in proving it. In this case, we can show that

D_tv(p_t, π) ≤ ε for t ≥ eD²/α · ln(ε^{−1}).   (2.35)

This is most easily shown using a martingale argument. Here we need D to be polynomial in the problem size. Consider a sequence (ω_0, ω'_0), (ω_1, ω'_1), ..., (ω_t, ω'_t), and define the random time

T_{ω,ω'} = min{t : φ(ω_t, ω'_t) = 0},

assuming that ω_0 = ω, ω'_0 = ω'. We prove that

E(T_{ω,ω'}) ≤ D²/α.   (2.36)

Let

Z(t) = φ(ω_t, ω'_t)² − 2Dφ(ω_t, ω'_t) − αt

and let

δ(t) = φ(ω_{t+1}, ω'_{t+1}) − φ(ω_t, ω'_t).

Then

E(Z(t + 1) − Z(t) | Z(0), Z(1), ..., Z(t)) = 2(φ(ω_t, ω'_t) − D) E(δ(t) | ω_t, ω'_t) + (E(δ(t)² | ω_t, ω'_t) − α) ≥ 0.

Hence Z(t) is a submartingale. The stopping time T_{ω,ω'} has finite expectation and the increments |Z(t + 1) − Z(t)| are bounded. We can therefore apply the Optional Stopping Theorem for submartingales to obtain

E(Z(T_{ω,ω'})) ≥ Z(0).

This implies

αE(T_{ω,ω'}) ≤ 2Dφ(ω, ω') − φ(ω, ω')² ≤ D²,

and (2.36) follows.

So for any ω, ω', Markov's inequality gives

Pr(T_{ω,ω'} ≥ eD²/α) ≤ e^{−1},

and by considering k consecutive time intervals of length eD²/α we obtain

Pr(T_{ω,ω'} ≥ keD²/α) ≤ e^{−k},

and (2.35) follows.

2.5 Hitting Time Lemmas

For a finite Markov chain M let Pr_i, E_i denote probability and expectation, given that X_0 = i. For a set A ⊆ Ω let

T_A = min{t ≥ 0 : X_t ∈ A}.

Then for i ≠ j the hitting time

H_{i,j} = E_i(T_j)

is the expected number of steps needed to get from state i to state j. The commute time is C_{i,j} = H_{i,j} + H_{j,i}.

Lemma Assume X_0 = i and S is a stopping time with X_S = i. Let j be an arbitrary state. Then

E_i(number of visits to state j before time S) = π_j E_i(S).

Proof Consider the renewal process whose inter-renewal time is distributed as S. The renewal-reward theorem states that the asymptotic proportion of time spent in state j is given by E_i(number of visits to j before time S)/E_i(S). This is also equal to π_j, by the ergodic theorem.

Lemma

E_j(number of visits to j before T_i) = π_j C_{i,j}.

Proof Let S be the time of the first return to i after the first visit to j. Apply the preceding lemma.

The cover time C(M) of M is max_i C_i(M), where C_i(M) = E_i(max_j T_j) is the expected time to visit all states starting at i.
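The first lemma is easy to test by simulation. A minimal Monte Carlo sketch (illustrative, on a hand-picked chain):

```python
import random

# Check of the first lemma on the lazy random walk on a triangle, where
# pi_j = 1/3 for every j.  Starting at i = 0, with S the first return
# time to 0, the expected number of visits to j = 1 before S should be
# pi_1 * E_0(S) = (1/3) * 3 = 1, since E_0(S) = 1/pi_0 by (1.14).
def step(v):
    if random.random() < 0.5:
        return v                           # lazy: stay put
    return (v + random.choice([1, 2])) % 3

visits, total_S, trials = 0, 0, 100000
for _ in range(trials):
    v, t = step(0), 1
    while v != 0:                          # run until the first return to 0
        visits += (v == 1)
        v, t = step(v), t + 1
    total_S += t
print(visits / trials, total_S / trials / 3)   # both close to 1
```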

CHAPTER 2: Bounding the Mixing Time

2.1 Spectral Gap

Let P be the transition matrix of an ergodic, reversible Markov chain on the state space Ω, let π be its stationary distribution, let N = |Ω|, and assume w.l.o.g. that Ω = {0, 1, ..., N − 1}. Let the eigenvalues of P be 1 = λ_0 > λ_1 ≥ ··· ≥ λ_{N−1} > −1; they are all real valued. Let λ_max = max{λ_1, |λ_{N−1}|}. The fact that λ_max < 1 is a classical result of the theory of non-negative matrices. The spectral gap 1 − λ_max determines the mixing rate of the chain in an essential way: the larger the gap, the more rapidly the chain mixes.

Theorem 2.1.1 For all ω, σ ∈ Ω and t ≥ 0,

|P^t(ω, σ) − π(σ)| ≤ (π(σ)/π(ω))^{1/2} λ_max^t.

Proof Let D^{1/2} be the diagonal Ω × Ω matrix with diagonal entries π(ω)^{1/2}, and let D^{−1/2} be its inverse. Then the reversibility (1.15) of the chain implies that the matrix S = D^{1/2} P D^{−1/2} is symmetric. It has the same eigenvalues as P, and the symmetry means that these are all real. We can therefore choose an orthonormal basis of column eigenvectors e^{(i)} ∈ R^Ω, where e^{(i)} has eigenvalue λ_i and e^{(0)}_ω = π(ω)^{1/2}. S has the spectral decomposition

S = Σ_{i=0}^{N−1} λ_i e^{(i)} (e^{(i)})^T.

It follows that S^t = Σ_{i=0}^{N−1} λ_i^t e^{(i)} (e^{(i)})^T for any t ≥ 0, and hence, since P^t = D^{−1/2} S^t D^{1/2}, in component form

P^t(ω, σ) = π(σ) + (π(σ)/π(ω))^{1/2} Σ_{i=1}^{N−1} λ_i^t e^{(i)}_ω e^{(i)}_σ.

With the help of the Cauchy–Schwarz inequality,

|Σ_{i=1}^{N−1} λ_i^t e^{(i)}_ω e^{(i)}_σ| ≤ λ_max^t (Σ_i (e^{(i)}_ω)²)^{1/2} (Σ_i (e^{(i)}_σ)²)^{1/2} = λ_max^t,

since the matrix of eigenvectors is orthogonal. The theorem follows by substitution.

In terms of mixing time we have

Corollary 2.1.1

τ(ε) ≤ (1 − λ_max)^{−1} (ln π_min^{−1} + ln ε^{−1}),

where π_min = min_{ω∈Ω} π(ω).

Proof For ω ∈ Ω we have, by Theorem 2.1.1,

D_tv(p_t, π) ≤ λ_max^t / π_min ≤ e^{−(1−λ_max)t} / π_min,

which is at most ε for t ≥ (1 − λ_max)^{−1}(ln π_min^{−1} + ln ε^{−1}).

As an example we consider the random walk W_n on the unit hypercube. Here the graph is the n-cube Q_n = (V_n, E_n), where V_n = {0, 1}^n and x, y are adjacent in Q_n iff their Hamming distance is one. We add n loops to each vertex to make the chain lazy. If G is a d-regular graph without loops and A is its adjacency matrix, then the probability transition matrix of a random walk on G is P = d^{−1}A. For graphs G_i = (V_i, E_i), i = 1, 2, we can define their product G_1 × G_2 = (V, E), where V = V_1 × V_2 and ((v_1, v_2), (w_1, w_2)) ∈ E iff either v_1 = w_1 and (v_2, w_2) ∈ E_2, or v_2 = w_2 and (v_1, w_1) ∈ E_1. Then

Q_n = K_2 × K_2 × ··· × K_2 (n-fold product).

Theorem 2.1.2 If λ_1, ..., λ_m and µ_1, ..., µ_n are the eigenvalues of the adjacency matrices of G_1 and G_2 respectively, then the eigenvalues of the adjacency matrix of G_1 × G_2 are λ_i + µ_j, 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Proof The adjacency matrix of G_1 × G_2 is A_1 ⊗ I + I ⊗ A_2, where ⊗ denotes the tensor product. The claim follows from the fact that if an mn × mn matrix is decomposed into m² blocks of n × n matrices which commute among themselves, then its determinant can be computed by treating the blocks as scalars and then taking the determinant of the resulting m × m matrix of blocks.

The eigenvalues of K_2 are ±1, so applying Theorem 2.1.2 repeatedly we see that the eigenvalues of Q_n are n − 2i, i = 0, 1, ..., n (ignoring multiplicities). To get the eigenvalues of our random walk we (a) divide by n, and then (b) replace each eigenvalue λ by (1 + λ)/2 to account for the added loops. Thus the eigenvalues of the lazy walk are 1 − i/n, i = 0, 1, ..., n, and in particular the second eigenvalue is 1 − 1/n. Applying Corollary 2.1.1 we obtain

τ(ε) ≤ n(n ln 2 + ln ε^{−1}) = O(n²)

for fixed ε. This poor estimate is due to our use of the Cauchy–Schwarz inequality in the proof of Theorem 2.1.1; coupling gives the better estimate n(log n + log ε^{−1}).

2.6 Conductance

The conductance of M is defined by

Φ = Φ(M) = min{Φ_S : S ⊆ Ω, 0 < π(S) ≤ 1/2},  where  Φ_S = Q(S, S̄)/π(S),

Q(ω, σ) = π(ω)P(ω, σ) and Q(S, S̄) = Σ_{ω∈S, σ∈S̄} Q(ω, σ). Thus Φ_S is the probability of moving from S to S̄ in one step of the chain, conditional on being in S. Clearly Φ_S ≤ 1, and Φ_S ≤ 1/2 if M is lazy. Note that

π(S)Φ_S = Q(S, S̄) = Q(S̄, S) = π(S̄)Φ_{S̄}.

Indeed,

Q(S, S̄) = Q(Ω, S̄) − Q(S̄, S̄) = π(S̄) − Q(S̄, S̄) = Q(S̄, Ω) − Q(S̄, S̄) = Q(S̄, S).

Let π_min = min{π(ω) : ω ∈ Ω} > 0 and π_max = max{π(ω) : ω ∈ Ω}.

2.6.1 Reversible chains

In this section we show how conductance gives us an estimate of the spectral gap of reversible chains.

Lemma If M is lazy and ergodic then all its eigenvalues are positive.

Proof Q' = 2P − I ≥ 0 is stochastic, and P = (I + Q')/2. If Q' has eigenvalues µ_0, µ_1, ..., µ_{N−1} ∈ [−1, 1], then the eigenvalues of P are λ_i = (1 + µ_i)/2 ≥ 0, and the result follows.

For y ∈ R^N let

E(y, y) = Σ_{i<j} Q(i, j)(y_i − y_j)².

Lemma If M is reversible then

1 − λ_1 = min{ E(y, y) / Σ_i π_i y_i² : π^T y = 0, y ≠ 0 }.

Proof Let S = D^{1/2} P D^{−1/2} be as in Section 2.1. Then by the Rayleigh principle,

λ_1 = max{ x^T S x / x^T x : x^T e^{(0)} = 0, x ≠ 0 },

and thus

1 − λ_1 = min{ x^T (I − S) x / x^T x : x^T e^{(0)} = 0, x ≠ 0 } = min{ y^T D(I − P) y / y^T D y : π^T y = 0, y ≠ 0 },

where we have substituted x = D^{1/2} y. Now

y^T D(I − P) y = Σ_i π_i y_i² − Σ_{i,j} π_i P(i, j) y_i y_j = Σ_{i<j} Q(i, j)(y_i − y_j)² = E(y, y),

and y^T D y = Σ_i π_i y_i², from which the lemma follows.

Theorem If M is reversible then λ_1 ≤ 1 − Φ²/2.

Proof (sketch) Take a y attaining the minimum in the previous lemma and order the states so that y_1 ≥ y_2 ≥ ··· ≥ y_N. An application of the Cauchy–Schwarz inequality to E(y, y) Σ_i π_i y_i², followed by summation over the level sets S_r = {1, ..., r}, each of which has conductance at least Φ, yields E(y, y)/Σ_i π_i y_i² ≥ Φ²/2.

Corollary If M is lazy, ergodic and reversible then

τ(ε) ≤ 2Φ^{−2}(ln π_min^{−1} + ln ε^{−1}).

Proof The first lemma of this section implies that λ_max = λ_1, and then the theorem above gives 1/(1 − λ_max) ≤ 2/Φ². Now apply Corollary 2.1.1.

Now consider the conductance of a random walk on a graph G = (V, E). For S, T ⊆ V let E(S, T) = {(v, w) ∈ E : v ∈ S, w ∈ T} and e(S, T) = |E(S, T)|. Then by definition

Φ_S = e(S, S̄) / Σ_{v∈S} d_v.

In particular, when G is an r-regular graph,

Φ = (1/r) min{ e(S, S̄)/|S| : S ⊆ V, 0 < |S| ≤ |V|/2 }.

The minimand above is referred to as the expansion of S. Thus graphs with good expansion (expander graphs) have large conductance, and random walks on them mix rapidly.

As an example consider the n-cube Q_n. For S ⊆ V_n let e_n(S) denote the number of edges of Q_n which are wholly contained in S.

Lemma If S ⊆ V_n then e_n(S) ≤ (1/2)|S| log₂ |S|.

Proof We prove this by induction on n. It is trivial for n = 1. For n > 1 let S_ξ = {x ∈ S : x_n = ξ} for ξ = 0, 1. Then

e_n(S) ≤ e_{n−1}(S_0) + e_{n−1}(S_1) + min{|S_0|, |S_1|},

where the term min{|S_0|, |S_1|} bounds the number of edges which are contained in S and join S_0 to S_1. The lemma now follows from the inequality

x log₂ x + y log₂ y + 2y ≤ (x + y) log₂(x + y)   (x ≥ y ≥ 0),

whose proof is left as a simple exercise in calculus.

By summing the degrees of the vertices of S we see that, by the above lemma, for 0 < |S| ≤ 2^{n−1},

Φ_S = e(S, S̄)/(n|S|) = (n|S| − 2e_n(S))/(n|S|) ≥ 1 − log₂|S|/n ≥ 1/n,

so that Φ ≥ 1/n.
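The eigenvalue computation for the lazy walk on the cube can be checked numerically for small n. A minimal sketch (illustrative Python, using numpy; not part of the notes):

```python
import numpy as np

# Spectral gap of the lazy random walk on the cube Q_n for small n,
# computed directly from the transition matrix.  By the derivation above
# the eigenvalues are 1 - i/n, so the gap 1 - lambda_max should be 1/n.
n = 4
N = 2 ** n
P = np.zeros((N, N))
for x in range(N):
    P[x, x] = 0.5                        # lazy: stay put with probability 1/2
    for i in range(n):
        P[x, x ^ (1 << i)] = 0.5 / n     # otherwise flip a uniform coordinate
eig = np.sort(np.linalg.eigvalsh(P))     # P is symmetric, so eigvalsh applies
lambda_max = max(eig[-2], abs(eig[0]))   # largest non-trivial |eigenvalue|
print(1 - lambda_max)                    # prints 0.25 = 1/n
```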


More information

INTRODUCTION TO MCMC AND PAGERANK. Eric Vigoda Georgia Tech. Lecture for CS 6505

INTRODUCTION TO MCMC AND PAGERANK. Eric Vigoda Georgia Tech. Lecture for CS 6505 INTRODUCTION TO MCMC AND PAGERANK Eric Vigoda Georgia Tech Lecture for CS 6505 1 MARKOV CHAIN BASICS 2 ERGODICITY 3 WHAT IS THE STATIONARY DISTRIBUTION? 4 PAGERANK 5 MIXING TIME 6 PREVIEW OF FURTHER TOPICS

More information

MONOTONE COUPLING AND THE ISING MODEL

MONOTONE COUPLING AND THE ISING MODEL MONOTONE COUPLING AND THE ISING MODEL 1. PERFECT MATCHING IN BIPARTITE GRAPHS Definition 1. A bipartite graph is a graph G = (V, E) whose vertex set V can be partitioned into two disjoint set V I, V O

More information

Model Counting for Logical Theories

Model Counting for Logical Theories Model Counting for Logical Theories Wednesday Dmitry Chistikov Rayna Dimitrova Department of Computer Science University of Oxford, UK Max Planck Institute for Software Systems (MPI-SWS) Kaiserslautern

More information

Flip dynamics on canonical cut and project tilings

Flip dynamics on canonical cut and project tilings Flip dynamics on canonical cut and project tilings Thomas Fernique CNRS & Univ. Paris 13 M2 Pavages ENS Lyon November 5, 2015 Outline 1 Random tilings 2 Random sampling 3 Mixing time 4 Slow cooling Outline

More information

Some Definition and Example of Markov Chain

Some Definition and Example of Markov Chain Some Definition and Example of Markov Chain Bowen Dai The Ohio State University April 5 th 2016 Introduction Definition and Notation Simple example of Markov Chain Aim Have some taste of Markov Chain and

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), Institute BW/WI & Institute for Computer Science, University of Hildesheim Course on Information Systems 2, summer term 2010 0/29 Information Systems 2 Information Systems 2 5. Business Process Modelling I: Models Lars Schmidt-Thieme Information Systems and Machine Learning Lab

More information

x 0, x 1,...,x n f(x) p n (x) = f[x 0, x 1,..., x n, x]w n (x),

x 0, x 1,...,x n f(x) p n (x) = f[x 0, x 1,..., x n, x]w n (x), ÛÜØ Þ ÜÒ Ô ÚÜ Ô Ü Ñ Ü Ô Ð Ñ Ü ÜØ º½ ÞÜ Ò f Ø ÚÜ ÚÛÔ Ø Ü Ö ºÞ ÜÒ Ô ÚÜ Ô Ð Ü Ð Þ Õ Ô ÞØÔ ÛÜØ Ü ÚÛÔ Ø Ü Ö L(f) = f(x)dx ÚÜ Ô Ü ÜØ Þ Ü Ô, b] Ö Û Þ Ü Ô Ñ ÒÖØ k Ü f Ñ Df(x) = f (x) ÐÖ D Ü Ü ÜØ Þ Ü Ô Ñ Ü ÜØ Ñ

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

ÆÓÒ¹ÒØÖÐ ËÒÐØ ÓÙÒÖÝ

ÆÓÒ¹ÒØÖÐ ËÒÐØ ÓÙÒÖÝ ÁÒØÖÐ ÓÙÒÖ Ò Ë»Ì Î ÊÐ ÔÖØÑÒØ Ó ÅØÑØ ÍÒÚÖ ØÝ Ó ÓÖ Á̳½½ ØÝ ÍÒÚÖ ØÝ ÄÓÒÓÒ ÔÖÐ ½ ¾¼½½ ÆÓÒ¹ÒØÖÐ ËÒÐØ ÓÙÒÖÝ ÇÙØÐÒ ËÙÔÖ ØÖÒ Ò Ë»Ì Ì ØÙÔ ÏÓÖÐ Ø Ë¹ÑØÖÜ ÍÒÖÐÝÒ ÝÑÑØÖ ÁÒØÖÐ ÓÙÒÖ ÁÒØÖÐØÝ Ø Ø ÓÙÒÖÝ» ÖÒ Ò ØÛ Ø ÒÒ Ú»Ú

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Faithful couplings of Markov chains: now equals forever

Faithful couplings of Markov chains: now equals forever Faithful couplings of Markov chains: now equals forever by Jeffrey S. Rosenthal* Department of Statistics, University of Toronto, Toronto, Ontario, Canada M5S 1A1 Phone: (416) 978-4594; Internet: jeff@utstat.toronto.edu

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

Introduction to Markov Chains and Riffle Shuffling

Introduction to Markov Chains and Riffle Shuffling Introduction to Markov Chains and Riffle Shuffling Nina Kuklisova Math REU 202 University of Chicago September 27, 202 Abstract In this paper, we introduce Markov Chains and their basic properties, and

More information

2 Hallén s integral equation for the thin wire dipole antenna

2 Hallén s integral equation for the thin wire dipole antenna Ú Ð Ð ÓÒÐ Ò Ø ØØÔ»» Ѻ Ö Ùº º Ö ÁÒغ º ÁÒ Ù ØÖ Ð Å Ø Ñ Ø ÎÓк ÆÓº ¾ ¾¼½½µ ½ ¹½ ¾ ÆÙÑ Ö Ð Ñ Ø Ó ÓÖ Ò ÐÝ Ó Ö Ø ÓÒ ÖÓÑ Ø Ò Û Ö ÔÓÐ ÒØ ÒÒ Ëº À Ø ÑÞ ¹Î ÖÑ ÞÝ Ö Åº Æ Ö¹ÅÓ Êº Ë Þ ¹Ë Ò µ Ô ÖØÑ ÒØ Ó Ð ØÖ Ð Ò Ò

More information

STOCHASTIC PROCESSES Basic notions

STOCHASTIC PROCESSES Basic notions J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving

More information

Markov Processes Hamid R. Rabiee

Markov Processes Hamid R. Rabiee Markov Processes Hamid R. Rabiee Overview Markov Property Markov Chains Definition Stationary Property Paths in Markov Chains Classification of States Steady States in MCs. 2 Markov Property A discrete

More information

A Note on the Glauber Dynamics for Sampling Independent Sets

A Note on the Glauber Dynamics for Sampling Independent Sets A Note on the Glauber Dynamics for Sampling Independent Sets Eric Vigoda Division of Informatics King s Buildings University of Edinburgh Edinburgh EH9 3JZ vigoda@dcs.ed.ac.uk Submitted: September 25,

More information

Probability & Computing

Probability & Computing Probability & Computing Stochastic Process time t {X t t 2 T } state space Ω X t 2 state x 2 discrete time: T is countable T = {0,, 2,...} discrete space: Ω is finite or countably infinite X 0,X,X 2,...

More information

INTRODUCTION TO MCMC AND PAGERANK. Eric Vigoda Georgia Tech. Lecture for CS 6505

INTRODUCTION TO MCMC AND PAGERANK. Eric Vigoda Georgia Tech. Lecture for CS 6505 INTRODUCTION TO MCMC AND PAGERANK Eric Vigoda Georgia Tech Lecture for CS 6505 1 MARKOV CHAIN BASICS 2 ERGODICITY 3 WHAT IS THE STATIONARY DISTRIBUTION? 4 PAGERANK 5 MIXING TIME 6 PREVIEW OF FURTHER TOPICS

More information

CONVEX OPTIMIZATION OVER POSITIVE POLYNOMIALS AND FILTER DESIGN. Y. Genin, Y. Hachez, Yu. Nesterov, P. Van Dooren

CONVEX OPTIMIZATION OVER POSITIVE POLYNOMIALS AND FILTER DESIGN. Y. Genin, Y. Hachez, Yu. Nesterov, P. Van Dooren CONVEX OPTIMIZATION OVER POSITIVE POLYNOMIALS AND FILTER DESIGN Y. Genin, Y. Hachez, Yu. Nesterov, P. Van Dooren CESAME, Université catholique de Louvain Bâtiment Euler, Avenue G. Lemaître 4-6 B-1348 Louvain-la-Neuve,

More information

j j ( ϕ j ) p (dd c ϕ j ) n < (dd c ϕ j ) n <.

j j ( ϕ j ) p (dd c ϕ j ) n < (dd c ϕ j ) n <. ÆÆÄË ÈÇÄÇÆÁÁ ÅÌÀÅÌÁÁ ½º¾ ¾¼¼µ ÓÒÖÒÒ Ø ÒÖÝ Ð E p ÓÖ 0 < p < 1 Ý ÈÖ ËÙÒ ÚÐе Ê ÞÝ ÃÖÛµ Ò È º Ñ ÀÓÒ À º Ô ÀÒÓµ ØÖغ Ì ÒÖÝ Ð E p ØÙ ÓÖ 0 < p < 1º ÖØÖÞØÓÒ Ó Ö¹ ØÒ ÓÙÒ ÔÐÙÖ ÙÖÑÓÒ ÙÒØÓÒ Ò ØÖÑ Ó F p Ò Ø ÔÐÙÖÓÑÔÐÜ

More information

Ch5. Markov Chain Monte Carlo

Ch5. Markov Chain Monte Carlo ST4231, Semester I, 2003-2004 Ch5. Markov Chain Monte Carlo In general, it is very difficult to simulate the value of a random vector X whose component random variables are dependent. In this chapter we

More information

. Find E(V ) and var(v ).

. Find E(V ) and var(v ). Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number

More information

«Û +(2 )Û, the total charge of the EH-pair is at most «Û +(2 )Û +(1+ )Û ¼, and thus the charging ratio is at most

«Û +(2 )Û, the total charge of the EH-pair is at most «Û +(2 )Û +(1+ )Û ¼, and thus the charging ratio is at most ÁÑÔÖÓÚ ÇÒÐÒ ÐÓÖØÑ ÓÖ Ù«Ö ÅÒÑÒØ Ò ÉÓË ËÛØ ÅÖ ÖÓ ÏÓ ÂÛÓÖ ÂÖ ËÐÐ Ý ÌÓÑ ÌÝ Ý ØÖØ We consider the following buffer management problem arising in QoS networks: packets with specified weights and deadlines arrive

More information

Lecture 6: September 22

Lecture 6: September 22 CS294 Markov Chain Monte Carlo: Foundations & Applications Fall 2009 Lecture 6: September 22 Lecturer: Prof. Alistair Sinclair Scribes: Alistair Sinclair Disclaimer: These notes have not been subjected

More information

Lecture 8: Path Technology

Lecture 8: Path Technology Counting and Sampling Fall 07 Lecture 8: Path Technology Lecturer: Shayan Oveis Gharan October 0 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

More information

The Monte Carlo Method

The Monte Carlo Method The Monte Carlo Method Example: estimate the value of π. Choose X and Y independently and uniformly at random in [0, 1]. Let Pr(Z = 1) = π 4. 4E[Z] = π. { 1 if X Z = 2 + Y 2 1, 0 otherwise, Let Z 1,...,

More information

1 Stat 605. Homework I. Due Feb. 1, 2011

1 Stat 605. Homework I. Due Feb. 1, 2011 The first part is homework which you need to turn in. The second part is exercises that will not be graded, but you need to turn it in together with the take-home final exam. 1 Stat 605. Homework I. Due

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours

More information

Essentials on the Analysis of Randomized Algorithms

Essentials on the Analysis of Randomized Algorithms Essentials on the Analysis of Randomized Algorithms Dimitris Diochnos Feb 0, 2009 Abstract These notes were written with Monte Carlo algorithms primarily in mind. Topics covered are basic (discrete) random

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,

More information

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that

More information

LIMITING PROBABILITY TRANSITION MATRIX OF A CONDENSED FIBONACCI TREE

LIMITING PROBABILITY TRANSITION MATRIX OF A CONDENSED FIBONACCI TREE International Journal of Applied Mathematics Volume 31 No. 18, 41-49 ISSN: 1311-178 (printed version); ISSN: 1314-86 (on-line version) doi: http://dx.doi.org/1.173/ijam.v31i.6 LIMITING PROBABILITY TRANSITION

More information

Powerful tool for sampling from complicated distributions. Many use Markov chains to model events that arise in nature.

Powerful tool for sampling from complicated distributions. Many use Markov chains to model events that arise in nature. Markov Chains Markov chains: 2SAT: Powerful tool for sampling from complicated distributions rely only on local moves to explore state space. Many use Markov chains to model events that arise in nature.

More information

Lecture 2: September 8

Lecture 2: September 8 CS294 Markov Chain Monte Carlo: Foundations & Applications Fall 2009 Lecture 2: September 8 Lecturer: Prof. Alistair Sinclair Scribes: Anand Bhaskar and Anindya De Disclaimer: These notes have not been

More information

Applied Stochastic Processes

Applied Stochastic Processes Applied Stochastic Processes Jochen Geiger last update: July 18, 2007) Contents 1 Discrete Markov chains........................................ 1 1.1 Basic properties and examples................................

More information

Frequency domain representation and singular value decomposition

Frequency domain representation and singular value decomposition EOLSS Contribution 643134 Frequency domain representation and singular value decomposition AC Antoulas Department of Electrical and Computer Engineering Rice University Houston, Texas 77251-1892, USA e-mail:

More information

A D VA N C E D P R O B A B I L - I T Y

A D VA N C E D P R O B A B I L - I T Y A N D R E W T U L L O C H A D VA N C E D P R O B A B I L - I T Y T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Conditional Expectation 5 1.1 Discrete Case 6 1.2

More information

Markov Chains and Stochastic Sampling

Markov Chains and Stochastic Sampling Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,

More information

6.842 Randomness and Computation February 24, Lecture 6

6.842 Randomness and Computation February 24, Lecture 6 6.8 Randomness and Computation February, Lecture 6 Lecturer: Ronitt Rubinfeld Scribe: Mutaamba Maasha Outline Random Walks Markov Chains Stationary Distributions Hitting, Cover, Commute times Markov Chains

More information

1 Random Walks and Electrical Networks

1 Random Walks and Electrical Networks CME 305: Discrete Mathematics and Algorithms Random Walks and Electrical Networks Random walks are widely used tools in algorithm design and probabilistic analysis and they have numerous applications.

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Randomized Simultaneous Messages: Solution of a Problem of Yao in Communication Complexity

Randomized Simultaneous Messages: Solution of a Problem of Yao in Communication Complexity Randomized Simultaneous Messages: Solution of a Problem of Yao in Communication Complexity László Babai Peter G. Kimmel Department of Computer Science The University of Chicago 1100 East 58th Street Chicago,

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Bichain graphs: geometric model and universal graphs

Bichain graphs: geometric model and universal graphs Bichain graphs: geometric model and universal graphs Robert Brignall a,1, Vadim V. Lozin b,, Juraj Stacho b, a Department of Mathematics and Statistics, The Open University, Milton Keynes MK7 6AA, United

More information

Random walks, Markov chains, and how to analyse them

Random walks, Markov chains, and how to analyse them Chapter 12 Random walks, Markov chains, and how to analyse them Today we study random walks on graphs. When the graph is allowed to be directed and weighted, such a walk is also called a Markov Chain.

More information